In this tutorial, we demonstrate:

  • How to run inference with MMSeg trained weights.
  • How to train on your own dataset and visualize the results.

Install MMSegmentation

This step may take several minutes.

We use PyTorch 1.5.0 and CUDA 10.1 for this tutorial. You may install other versions by changing the version numbers in the pip install commands.

# Check nvcc version
!nvcc -V
# Check GCC version
!gcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# # Install PyTorch
# !pip install -U torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# # Install MMCV
# !pip install mmcv-full==latest+torch1.5.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html
# !rm -rf mmsegmentation
# !git clone https://github.com/open-mmlab/mmsegmentation.git 
# %cd mmsegmentation
# !pip install -e .
# Check PyTorch installation
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())

# Check MMSegmentation installation
import mmseg
print(mmseg.__version__)
1.7.0 True
0.19.0

Run Inference with MMSeg trained weights

# !mkdir checkpoints
# !wget https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth -P checkpoints
from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot
from mmseg.core.evaluation import get_palette
config_file = '/home/ubuntu/sharedData/swp/dlLabSwp/favourite/swpFastTest/mmsegmentation/configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
# build the model from a config file and a checkpoint file
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
Use load_from_local loader
# test a single image
img = '/home/ubuntu/sharedData/swp/dlLabSwp/favourite/swpFastTest/mmsegmentation/demo/demo.png'
result = inference_segmentor(model, img)
# show the results
show_result_pyplot(model, img, result, get_palette('cityscapes'))
/home/ubuntu/miniconda3/envs/new/lib/python3.8/site-packages/mmseg/models/segmentors/base.py:264: UserWarning: show==False and out_file is not specified, only result image will be returned
  warnings.warn('show==False and out_file is not specified, only '
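
For intuition about what the visualization does, here is a minimal sketch of coloring a class-index map with a palette and blending it with the image. The tiny arrays, the two-color palette, the 0.5 opacity, and the helper name `overlay_segmentation` are illustrative assumptions; `show_result_pyplot` handles this internally and its exact behavior may differ.

```python
import numpy as np


def overlay_segmentation(image, seg, palette, opacity=0.5):
    """Blend an (H, W, 3) uint8 image with a color-coded (H, W) class map."""
    palette = np.asarray(palette, dtype=np.uint8)
    color_seg = palette[seg]  # fancy indexing: (H, W) -> (H, W, 3)
    blended = image * (1 - opacity) + color_seg * opacity
    return blended.astype(np.uint8)


image = np.full((2, 2, 3), 100, dtype=np.uint8)   # flat gray image
seg = np.array([[0, 1], [1, 0]])                  # predicted class per pixel
palette = [(0, 0, 0), (200, 0, 0)]                # hypothetical 2-class palette

blended = overlay_segmentation(image, seg, palette)
print(blended[0, 0], blended[0, 1])
```

Pixels of class 0 are darkened toward black and pixels of class 1 are tinted red, which is the effect a palette such as the Cityscapes one produces at full scale.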

Train a semantic segmentation model on a new dataset

To train on a customized dataset, the following steps are necessary.

  1. Add a new dataset class.
  2. Create a config file accordingly.
  3. Perform training and evaluation.

Add a new dataset

Datasets in MMSegmentation require images and semantic segmentation maps to be placed in separate folders, with each image and its map sharing the same filename prefix. To support a new dataset, we may need to modify the original file structure.

In this tutorial, we give an example of converting the dataset. You may refer to docs for details about dataset reorganization.
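
As a rough illustration of the layout requirement above, the sketch below pairs each image with its segmentation map by shared file stem and reports any image that lacks a label. The directory names, the suffixes, and the helper `pair_images_and_labels` are assumptions made for the example, not part of MMSegmentation.

```python
import tempfile
from pathlib import Path


def pair_images_and_labels(img_dir, ann_dir, img_suffix=".jpg", seg_suffix=".png"):
    """Pair files that share a stem; return (pairs, stems_without_label)."""
    img_dir, ann_dir = Path(img_dir), Path(ann_dir)
    pairs, missing = [], []
    for img_path in sorted(img_dir.glob(f"*{img_suffix}")):
        seg_path = ann_dir / f"{img_path.stem}{seg_suffix}"
        if seg_path.exists():
            pairs.append((img_path, seg_path))
        else:
            missing.append(img_path.stem)
    return pairs, missing


# Build a throwaway layout to exercise the helper (file names are made up).
with tempfile.TemporaryDirectory() as root:
    images, labels = Path(root, "images"), Path(root, "labels")
    images.mkdir()
    labels.mkdir()
    for stem in ("0001", "0002", "0003"):
        (images / f"{stem}.jpg").touch()
    for stem in ("0001", "0002"):
        (labels / f"{stem}.png").touch()

    pairs, missing = pair_images_and_labels(images, labels)
    paired_stems = [img.stem for img, _ in pairs]
    print(paired_stems, missing)
```

Running a check like this before training can surface images whose annotation was never converted.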

We use the Stanford Background Dataset as an example. The dataset contains 715 images chosen from the existing public datasets LabelMe, MSRC, PASCAL VOC, and Geometric Context. Images from these datasets are mainly outdoor scenes, each approximately 320-by-240 pixels. In this tutorial, we use the region annotations as labels. There are 8 classes in total: sky, tree, road, grass, water, building, mountain, and foreground object.

# download and unzip
# !wget http://dags.stanford.edu/data/iccv09Data.tar.gz -O standford_background.tar.gz
# !tar xf standford_background.tar.gz

explore the Potsdam dataset

overview

# Let's take a look at the dataset
import mmcv
import matplotlib.pyplot as plt
from fastcore.basics import *
from fastai.vision.all import *
from fastai.torch_basics import *
import warnings
warnings.filterwarnings("ignore")
import kornia
from kornia.constants import Resample
from kornia.color import *
from kornia import augmentation as K
import kornia.augmentation as F
import kornia.augmentation.random_generator as rg
from torchvision.transforms import functional as tvF
from torchvision.transforms import transforms
from torchvision.transforms import PILToTensor

import matplotlib.pyplot as plt
import numpy as np

set_seed(105)
train_a_path = Path("/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/2_Ortho_RGB/")
label_a_path = Path("/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/5_labels_for_participants/")
dsm_path = Path("/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/1_dsm/1_DSM/")
ndsm_path = Path("/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/1_dsm_normalisation/1_DSM_normalisation/")
imgNames = get_image_files(train_a_path)
lblNames = get_image_files(label_a_path)
dsmNames = get_image_files(dsm_path)
# data
imgNames[0]= Path('/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/2_Ortho_RGB/top_potsdam_2_11_RGB.tif')
lblNames[0]= Path('/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/5_labels_for_participants/top_potsdam_2_11_label.tif')
dsmNames[0]=Path('/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/1_dsm/1_DSM/dsm_potsdam_02_11.tif')

img = mmcv.imread(imgNames[0],channel_order='rgb')
plt.figure(figsize=(8, 8))
plt.imshow(img)
plt.axis('off')
plt.show()
<Figure size 576x576 with 0 Axes>
<matplotlib.image.AxesImage at 0x7ff55a468550>
(-0.5, 5999.5, 5999.5, -0.5)
torch.cuda.empty_cache()
!nvidia-smi
Sat Nov 20 01:55:13 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:05:00.0 Off |                  N/A |
| 33%   56C    P2    65W / 250W |   2625MiB / 12194MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   32C    P8     9W / 250W |     10MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

look at the annotations

transform between different types

to_tensor = transforms.ToTensor()
to_pil = transforms.ToPILImage()
rgbImage = Image.open(imgNames[0])
lblImage = Image.open(lblNames[0])
dsmImage = Image.open(dsmNames[0])
rgbTensor = image2tensor(rgbImage)
lblTensor = image2tensor(lblImage)
dsmTensor = image2tensor(dsmImage)
type(lblTensor)
rgbTensor.shape
lblTensor.shape
dsmTensor.shape
torch.unique(lblTensor)
lblTensor.shape
torch.Tensor
torch.Size([3, 6000, 6000])
torch.Size([3, 6000, 6000])
torch.Size([1, 6000, 6000])
tensor([  0, 255], dtype=torch.uint8)
torch.Size([3, 6000, 6000])
# Pay attention to the dimension order; different packages lay out image data differently:
# tensor from image2tensor: (C, H, W)
# skimage / NumPy array: (H, W, C)
rgbArray = to_np(rgbTensor).transpose(1,2,0)
lblArray = to_np(lblTensor).transpose(1,2,0)
dsmArray = to_np(dsmTensor).transpose(1,2,0)
np.unique(lblArray)
array([  0, 255], dtype=uint8)

The original label contains only the values 0 and 255 in each channel, so we need to convert it to a single-channel (grayscale) label map in which each class has one constant integer value.
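
The idea can be sketched on a tiny array: each RGB triple is looked up in a color-to-index table, so the label becomes a single-channel map of small integers. The 2x2 label and the three-color table below are made up for illustration. The lookup packs each color into one integer before comparing, which is one possible approach and not the method used later in this notebook.

```python
import numpy as np

# Hypothetical color table: RGB triple -> class index.
color_to_index = {(255, 255, 255): 0, (0, 0, 255): 1, (0, 255, 0): 2}


def pack_rgb(arr):
    """Collapse the last axis (R, G, B) into one integer per pixel."""
    arr = arr.astype(np.int64)
    return (arr[..., 0] << 16) | (arr[..., 1] << 8) | arr[..., 2]


def rgb_to_index(label_rgb, table, unknown=255):
    """Map an (H, W, 3) color label to an (H, W) index map."""
    packed = pack_rgb(label_rgb)
    out = np.full(packed.shape, unknown, dtype=np.uint8)
    for color, index in table.items():
        out[packed == pack_rgb(np.array(color))] = index
    return out


label_rgb = np.array(
    [[[255, 255, 255], [0, 0, 255]],
     [[0, 255, 0], [255, 0, 0]]], dtype=np.uint8)

index_map = rgb_to_index(label_rgb, color_to_index)
print(index_map)
```

Colors missing from the table (the red pixel here) fall back to the `unknown` value, which makes unexpected colors easy to spot with `np.unique`.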

rgbArray.shape
type(rgbArray)
(6000, 6000, 3)
numpy.ndarray

grayscale images

palette = {0 : (255, 255, 255), # Impervious surfaces (white)
           1 : (0, 0, 255),     # Buildings (blue)
           2 : (0, 255, 255),   # Low vegetation (cyan)
           3 : (0, 255, 0),     # Trees (green)
           4 : (255, 255, 0),   # Cars (yellow)
           5 : (255, 0, 0),     # Clutter (red)
           6 : (0, 0, 0)}       # Undefined (black)
invert_palette = {v: k for k, v in palette.items()}
def convert_to_color(arr_2d, palette=palette):
    """ Numeric labels to RGB-color encoding """
    arr_3d = np.zeros((arr_2d.shape[0], arr_2d.shape[1], 3), dtype=np.uint8)

    for c, i in palette.items():
        m = arr_2d == c
        arr_3d[m] = i

    return arr_3d

# original label is RGB, we need to have a grayscale label
def convert_from_color(arr_3d, palette=invert_palette):
    """ RGB-color encoding to grayscale labels """
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)

    for c, i in palette.items():
        m = np.all(arr_3d == np.array(c).reshape(1, 1, 3), axis=2)
        arr_2d[m] = i

    return arr_2d
np.unique(convert_from_color(lblArray))
array([0, 1, 2, 3, 4, 5], dtype=uint8)
transformedArray  = convert_from_color(lblArray)
show_image(lblImage)
show_image(transformedArray,cmap='gray')
<AxesSubplot:>
<AxesSubplot:>
type(transformedArray)
numpy.ndarray

convert the array to a palettized grayscale image (PIL mode "P")

paletteValue = list(palette.values())
paletteValue
[(255, 255, 255),
 (0, 0, 255),
 (0, 255, 255),
 (0, 255, 0),
 (255, 255, 0),
 (255, 0, 0),
 (0, 0, 0)]
temp = Image.fromarray(transformedArray).convert('P')
temp.putpalette(np.array(paletteValue, dtype=np.uint8))
# same as the array has shown!
temp
type(temp)
np.unique(temp)
PIL.Image.Image
array([0, 1, 2, 3, 4, 5], dtype=uint8)

transform the whole dataset

paletteValue
[(255, 255, 255),
 (0, 0, 255),
 (0, 255, 255),
 (0, 255, 0),
 (255, 255, 0),
 (255, 0, 0),
 (0, 0, 0)]
len(lblNames)
24
lblNames[0].parent
Path('/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/5_labels_for_participants')
lblNames[0]
Path('/home/ubuntu/sharedData/swp/dlLab/fastaiRepository/fastai/data/rsData/kaggleOriginal/Potsdam/5_labels_for_participants/top_potsdam_2_11_label.tif')
print(f'label shape using PIL to read is {lblImage.shape}')
label shape using PIL to read is (6000, 6000)
from skimage import io

print(f'label image shape using skimage to read is {io.imread(lblNames[0]).shape}')
# Note that skimage.io and PIL report different shapes for the same label image:
# PIL shape (6000, 6000), io shape (6000, 6000, 3)
temp = np.asarray(convert_from_color(io.imread(lblNames[0])),dtype='int64')
print(f'transformed label shape is {temp.shape}')
print(f'label has {np.unique(temp)} grayscale values')
label image shape using skimage to read is (6000, 6000, 3)
transformed label shape is (6000, 6000)
label has [0 1 2 3 4 5] grayscale values
type(temp)
numpy.ndarray
temp
array([[2, 2, 2, ..., 0, 0, 0],
       [2, 2, 2, ..., 0, 0, 0],
       [2, 2, 2, ..., 0, 0, 0],
       ...,
       [1, 1, 1, ..., 2, 2, 2],
       [1, 1, 1, ..., 2, 2, 2],
       [1, 1, 1, ..., 2, 2, 2]])
temp.shape
(6000, 6000)
convert_to_color(temp).shape
(6000, 6000, 3)
show_image(convert_to_color(temp))
<AxesSubplot:>
tempImage = Image.fromarray(np.uint8(temp)).convert('P')
tempImage
tempImage.putpalette(np.array(paletteValue, dtype=np.uint8))
tempImage

The shapes differ depending on how the image is read, and this determines how we convert the RGB label to a grayscale label.
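
Because the same label can arrive as (H, W), (H, W, C), or (C, H, W) depending on the reader, a small guard before conversion can help. The helper below is a sketch under the assumption that color labels have exactly 3 channels and that both spatial sides are larger than 3 pixels; `to_hwc` is a name invented here.

```python
import numpy as np


def to_hwc(arr):
    """Return labels as (H, W) index maps or (H, W, 3) color arrays."""
    arr = np.asarray(arr)
    if arr.ndim == 2:
        return arr                      # already a single-channel index map
    if arr.ndim == 3 and arr.shape[-1] == 3:
        return arr                      # channels-last, e.g. skimage.io.imread
    if arr.ndim == 3 and arr.shape[0] == 3:
        return arr.transpose(1, 2, 0)   # channels-first, e.g. a tensor layout
    if arr.ndim == 3 and arr.shape[0] == 1:
        return arr[0]                   # single channel stored as (1, H, W)
    raise ValueError(f"unexpected label shape {arr.shape}")


chw = np.zeros((3, 6, 8), dtype=np.uint8)
hwc = np.zeros((6, 8, 3), dtype=np.uint8)
gray = np.zeros((1, 6, 8), dtype=np.uint8)

print(to_hwc(chw).shape, to_hwc(hwc).shape, to_hwc(gray).shape)
```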

# turn the images in the annotations to grayscale
classes = ['Impervious surface','Buildings','Low vegetation','Trees','Cars','Clutter','Background']
def turnDataset2Gray():
    # NOTE: each converted label is written back over the original .tif file
    for index in range(len(lblNames)):
        lblImage = Image.open(lblNames[index])
        lblTensor = image2tensor(lblImage)
        lblArray = to_np(lblTensor).transpose(1, 2, 0)
        transformedArray = convert_from_color(lblArray)
        temp = Image.fromarray(transformedArray).convert('P')
        temp.putpalette(np.array(paletteValue, dtype=np.uint8))
        temp.save(lblNames[index].parent/f'{lblNames[index].stem}.tif')
        print(f'{lblNames[index].stem} saved')

# turnDataset2Gray()
top_potsdam_2_11_label saved
top_potsdam_2_10_label saved
top_potsdam_6_8_label saved
top_potsdam_4_11_label saved
top_potsdam_4_12_label saved
top_potsdam_3_10_label saved
top_potsdam_6_11_label saved
top_potsdam_7_11_label saved
top_potsdam_3_11_label saved
top_potsdam_5_10_label saved
top_potsdam_6_9_label saved
top_potsdam_5_11_label saved
top_potsdam_2_12_label saved
top_potsdam_6_12_label saved
top_potsdam_7_9_label saved
top_potsdam_6_10_label saved
top_potsdam_6_7_label saved
top_potsdam_7_8_label saved
top_potsdam_7_10_label saved
top_potsdam_7_12_label saved
top_potsdam_5_12_label saved
top_potsdam_3_12_label saved
top_potsdam_2_11_label saved
top_potsdam_4_10_label saved
import os.path as osp
import numpy as np
from PIL import Image
# convert dataset annotation to semantic segmentation map
data_root = 'iccv09Data'
img_dir = 'images'
ann_dir = 'labels'
osp.join(data_root,ann_dir)
'iccv09Data/labels'
classes
['Impervious surface',
 'Buildings',
 'Low vegetation',
 'Trees',
 'Cars',
 'Clutter',
 'Background']
# Let's take a look at the segmentation map we got
import matplotlib.patches as mpatches
img = Image.open(lblNames[0])
plt.figure(figsize=(8, 6))
img
# im = plt.imshow(convert_to_color(np.array(img)))
<Figure size 576x432 with 0 Axes>
<Figure size 576x432 with 0 Axes>
img.shape
(6000, 6000)
np.unique(np.array(img))
array([0, 6], dtype=uint8)
np.unique(convert_to_color(np.array(img)))
array([  0, 255], dtype=uint8)
test = convert_to_color(np.array(img))
test.shape
show_image(test)
(6000, 6000, 3)
<AxesSubplot:>
convert_to_color(np.array(img))
array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       ...,

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]], dtype=uint8)
# create a patch (proxy artist) for every color 
patches = [mpatches.Patch(color=np.array(paletteValue[i])/255., 
                          label=classes[i]) for i in range(7)]
# put those patches as legend handles into the legend
plt.legend(handles=patches, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., 
           fontsize='large')
plt.axis('off')
plt.show()
<Figure size 576x432 with 0 Axes>
<matplotlib.legend.Legend at 0x7f2802161580>
(-0.5, 5999.5, 5999.5, -0.5)
# split train/val set randomly
split_dir = 'splits'
mmcv.mkdir_or_exist(osp.join(data_root, split_dir))
filename_list = [osp.splitext(filename)[0] for filename in mmcv.scandir(
    osp.join(data_root, ann_dir), suffix='.png')]
len(filename_list)
715
with open(osp.join(data_root, split_dir, 'train.txt'), 'w') as f:
  # select first 4/5 as train set
  train_length = int(len(filename_list)*4/5)
  f.writelines(line + '\n' for line in filename_list[:train_length])
with open(osp.join(data_root, split_dir, 'val.txt'), 'w') as f:
  # select last 1/5 as val set
  f.writelines(line + '\n' for line in filename_list[train_length:])
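
The split above takes the first 4/5 of the names in whatever order `mmcv.scandir` returns them. If a shuffled but reproducible split is preferred, one option is sketched below; the helper name `split_train_val`, the seed value, and the placeholder file names are assumptions for the example.

```python
import random


def split_train_val(names, train_fraction=0.8, seed=0):
    """Shuffle a copy of names with a fixed seed, then cut it into two parts."""
    shuffled = sorted(names)               # sort first so the result is stable
    random.Random(seed).shuffle(shuffled)
    train_length = int(len(shuffled) * train_fraction)
    return shuffled[:train_length], shuffled[train_length:]


names = [f"img_{i:04d}" for i in range(715)]   # placeholder names, 715 as above
train_names, val_names = split_train_val(names)
print(len(train_names), len(val_names))
```

Sorting before shuffling means the result depends only on the seed and the set of names, not on the directory listing order.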

After downloading the data, we need to implement the load_annotations function in the new dataset class StandfordBackgroundDataset.
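
To make the role of the split file concrete, the sketch below reads a list of stems and builds one record per sample with an image file name and a segmentation-map file name. It is written in plain Python and only approximates what a `CustomDataset` subclass does with `split`, `img_suffix`, and `seg_map_suffix`; the function name `load_split_records` and the record layout are assumptions, so consult the MMSegmentation source for the actual behavior.

```python
import os.path as osp
import tempfile


def load_split_records(split_file, img_suffix=".jpg", seg_map_suffix=".png"):
    """Turn each non-empty line of a split file into a sample record."""
    records = []
    with open(split_file) as f:
        for line in f:
            stem = line.strip()
            if not stem:
                continue
            records.append(
                dict(filename=stem + img_suffix,
                     ann=dict(seg_map=stem + seg_map_suffix)))
    return records


# Write a tiny split file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    split_file = osp.join(tmp, "train.txt")
    with open(split_file, "w") as f:
        f.writelines(name + "\n" for name in ["0000047", "0000051"])
    records = load_split_records(split_file)

print(records[0])
```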

palette  # number of classes: 8 in total
[[128, 128, 128],
 [129, 127, 38],
 [120, 69, 125],
 [53, 125, 34],
 [0, 11, 123],
 [118, 20, 12],
 [122, 81, 25],
 [241, 134, 51]]
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

@DATASETS.register_module()
class StandfordBackgroundDataset(CustomDataset):
  CLASSES = classes
  PALETTE = palette
  def __init__(self, split, **kwargs):
    super().__init__(img_suffix='.jpg', seg_map_suffix='.png', 
                     split=split, **kwargs)
    assert osp.exists(self.img_dir) and self.split is not None

    

Create a config file

In the next step, we need to modify the config for the training. To accelerate the process, we finetune the model from trained weights.

from mmcv import Config
cfg = Config.fromfile('../configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py')

Since the given config is used to train PSPNet on cityscapes dataset, we need to modify it accordingly for our new dataset.

from mmseg.apis import set_random_seed

# Since we use only one GPU, BN is used instead of SyncBN
cfg.norm_cfg = dict(type='BN', requires_grad=True)
cfg.model.backbone.norm_cfg = cfg.norm_cfg
cfg.model.decode_head.norm_cfg = cfg.norm_cfg
cfg.model.auxiliary_head.norm_cfg = cfg.norm_cfg
# modify num classes of the model in decode/auxiliary head
cfg.model.decode_head.num_classes = 8
cfg.model.auxiliary_head.num_classes = 8

# Modify dataset type and path
cfg.dataset_type = 'StandfordBackgroundDataset'
cfg.data_root = data_root

cfg.data.samples_per_gpu = 8
cfg.data.workers_per_gpu=8

cfg.img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
cfg.crop_size = (256, 256)
cfg.train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(320, 240), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=cfg.crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **cfg.img_norm_cfg),
    dict(type='Pad', size=cfg.crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]

cfg.test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(320, 240),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **cfg.img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]


cfg.data.train.type = cfg.dataset_type
cfg.data.train.data_root = cfg.data_root
cfg.data.train.img_dir = img_dir
cfg.data.train.ann_dir = ann_dir
cfg.data.train.pipeline = cfg.train_pipeline
cfg.data.train.split = 'splits/train.txt'

cfg.data.val.type = cfg.dataset_type
cfg.data.val.data_root = cfg.data_root
cfg.data.val.img_dir = img_dir
cfg.data.val.ann_dir = ann_dir
cfg.data.val.pipeline = cfg.test_pipeline
cfg.data.val.split = 'splits/val.txt'

cfg.data.test.type = cfg.dataset_type
cfg.data.test.data_root = cfg.data_root
cfg.data.test.img_dir = img_dir
cfg.data.test.ann_dir = ann_dir
cfg.data.test.pipeline = cfg.test_pipeline
cfg.data.test.split = 'splits/val.txt'

# Load the PSPNet weights pre-trained on Cityscapes as the starting point
# for fine-tuning
cfg.load_from = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

# Set up working dir to save files and logs.
cfg.work_dir = './work_dirs/tutorial'

cfg.runner.max_iters = 200
cfg.log_config.interval = 10
cfg.evaluation.interval = 200
cfg.checkpoint_config.interval = 200

# Set seed to facilitate reproducing the result
cfg.seed = 0
set_random_seed(0, deterministic=False)
cfg.gpu_ids = range(1)

# Let's have a look at the final config used for training
print(f'Config:\n{cfg.pretty_text}')
Config:
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='PSPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        pool_scales=(1, 2, 3, 6),
        dropout_ratio=0.1,
        num_classes=8,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=8,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
dataset_type = 'StandfordBackgroundDataset'
data_root = 'iccv09Data'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (256, 256)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(320, 240), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=(256, 256), cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(256, 256), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(320, 240),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=8,
    train=dict(
        type='StandfordBackgroundDataset',
        data_root='iccv09Data',
        img_dir='images',
        ann_dir='labels',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(320, 240), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(256, 256), cat_max_ratio=0.75),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(256, 256), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ],
        split='splits/train.txt'),
    val=dict(
        type='StandfordBackgroundDataset',
        data_root='iccv09Data',
        img_dir='images',
        ann_dir='labels',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(320, 240),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        split='splits/val.txt'),
    test=dict(
        type='StandfordBackgroundDataset',
        data_root='iccv09Data',
        img_dir='images',
        ann_dir='labels',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(320, 240),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        split='splits/val.txt'))
log_config = dict(
    interval=10, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)
runner = dict(type='IterBasedRunner', max_iters=200)
checkpoint_config = dict(by_epoch=False, interval=200)
evaluation = dict(interval=200, metric='mIoU', pre_eval=True)
work_dir = './work_dirs/tutorial'
seed = 0
gpu_ids = range(0, 1)
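
The printed config uses `lr_config = dict(policy='poly', power=0.9, min_lr=0.0001, by_epoch=False)` with a base rate of 0.01 and 200 iterations. A common form of polynomial decay is sketched below. The formula and the zero-based step indexing are assumptions about how the hook applies the policy, although they appear consistent with the rates in the training log (for example 9.598e-03 at `Iter [10/200]`).

```python
def poly_lr(step, base_lr=0.01, min_lr=0.0001, power=0.9, max_iters=200):
    """Polynomial decay from base_lr towards min_lr over max_iters steps."""
    coeff = (1 - step / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


# Steps are zero-based here, so the 10th iteration corresponds to step 9.
for iteration in (10, 20, 100, 200):
    print(f"iter {iteration:3d}: lr = {poly_lr(iteration - 1):.3e}")
```

With `power` below 1 the rate falls slowly at first and faster near the end, and `min_lr` acts as a floor once `step` reaches `max_iters`.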

Train and Evaluation

from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor
from mmseg.apis import train_segmentor


# Build the dataset
datasets = [build_dataset(cfg.data.train)]

# Build the segmentor
model = build_segmentor(
    cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
# Add an attribute for visualization convenience
model.CLASSES = datasets[0].CLASSES

# Create work_dir
mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
train_segmentor(model, datasets, cfg, distributed=False, validate=True, 
                meta=dict())
/home/ubuntu/miniconda3/envs/new/lib/python3.8/site-packages/mmcv/utils/misc.py:323: UserWarning: "flip_ratio" is deprecated in `RandomFlip.__init__`, please use "prob" instead
  warnings.warn(
2021-11-18 03:54:49,540 - mmseg - INFO - Loaded 572 images
/home/ubuntu/miniconda3/envs/new/lib/python3.8/site-packages/mmseg/models/backbones/resnet.py:431: UserWarning: DeprecationWarning: pretrained is a deprecated, please use "init_cfg" instead
  warnings.warn('DeprecationWarning: pretrained is a deprecated, '
2021-11-18 03:54:50,369 - mmseg - INFO - Loaded 143 images
2021-11-18 03:54:50,370 - mmseg - INFO - load checkpoint from checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth
2021-11-18 03:54:50,371 - mmseg - INFO - Use load_from_local loader
2021-11-18 03:54:50,504 - mmseg - WARNING - The model and loaded state dict do not match exactly

size mismatch for decode_head.conv_seg.weight: copying a param with shape torch.Size([19, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 512, 1, 1]).
size mismatch for decode_head.conv_seg.bias: copying a param with shape torch.Size([19]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for auxiliary_head.conv_seg.weight: copying a param with shape torch.Size([19, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([8, 256, 1, 1]).
size mismatch for auxiliary_head.conv_seg.bias: copying a param with shape torch.Size([19]) from checkpoint, the shape in current model is torch.Size([8]).
2021-11-18 03:54:50,508 - mmseg - INFO - Start running, host: ubuntu@3a26d25f88f6, work_dir: /home/ubuntu/sharedData/swp/dlLabSwp/favourite/swpFastTest/mmsegmentation/demo/work_dirs/tutorial
2021-11-18 03:54:50,509 - mmseg - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) PolyLrUpdaterHook                  
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) OptimizerHook                      
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(LOW         ) EvalHook                           
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
2021-11-18 03:54:50,509 - mmseg - INFO - workflow: [('train', 1)], max: 200 iters
2021-11-18 03:54:58,809 - mmseg - INFO - Iter [10/200]	lr: 9.598e-03, eta: 0:02:31, time: 0.795, data_time: 0.011, memory: 3521, decode.loss_ce: 1.5417, decode.acc_seg: 44.9363, aux.loss_ce: 0.6946, aux.acc_seg: 28.9168, loss: 2.2364
2021-11-18 03:55:06,746 - mmseg - INFO - Iter [20/200]	lr: 9.149e-03, eta: 0:02:22, time: 0.793, data_time: 0.005, memory: 3521, decode.loss_ce: 0.9195, decode.acc_seg: 65.4073, aux.loss_ce: 0.5320, aux.acc_seg: 61.0668, loss: 1.4515
2021-11-18 03:55:14,863 - mmseg - INFO - Iter [30/200]	lr: 8.698e-03, eta: 0:02:16, time: 0.812, data_time: 0.005, memory: 3521, decode.loss_ce: 0.6681, decode.acc_seg: 64.2869, aux.loss_ce: 0.3623, aux.acc_seg: 63.7490, loss: 1.0304
2021-11-18 03:55:22,854 - mmseg - INFO - Iter [40/200]	lr: 8.244e-03, eta: 0:02:07, time: 0.799, data_time: 0.005, memory: 3521, decode.loss_ce: 0.5276, decode.acc_seg: 71.3276, aux.loss_ce: 0.2769, aux.acc_seg: 69.5687, loss: 0.8045
2021-11-18 03:55:30,905 - mmseg - INFO - Iter [50/200]	lr: 7.788e-03, eta: 0:02:00, time: 0.805, data_time: 0.005, memory: 3521, decode.loss_ce: 0.5736, decode.acc_seg: 65.9003, aux.loss_ce: 0.2798, aux.acc_seg: 63.7514, loss: 0.8534
2021-11-18 03:55:39,100 - mmseg - INFO - Iter [60/200]	lr: 7.328e-03, eta: 0:01:52, time: 0.820, data_time: 0.005, memory: 3521, decode.loss_ce: 0.6400, decode.acc_seg: 65.1311, aux.loss_ce: 0.2979, aux.acc_seg: 62.5127, loss: 0.9379
2021-11-18 03:55:47,147 - mmseg - INFO - Iter [70/200]	lr: 6.865e-03, eta: 0:01:44, time: 0.805, data_time: 0.005, memory: 3521, decode.loss_ce: 0.5808, decode.acc_seg: 68.3269, aux.loss_ce: 0.2635, aux.acc_seg: 67.6055, loss: 0.8444
2021-11-18 03:55:55,779 - mmseg - INFO - Iter [80/200]	lr: 6.398e-03, eta: 0:01:37, time: 0.863, data_time: 0.052, memory: 3521, decode.loss_ce: 0.5458, decode.acc_seg: 72.6513, aux.loss_ce: 0.2604, aux.acc_seg: 69.9083, loss: 0.8062
2021-11-18 03:56:04,009 - mmseg - INFO - Iter [90/200]	lr: 5.928e-03, eta: 0:01:29, time: 0.823, data_time: 0.005, memory: 3521, decode.loss_ce: 0.5700, decode.acc_seg: 71.2091, aux.loss_ce: 0.2743, aux.acc_seg: 68.0706, loss: 0.8443
2021-11-18 03:56:12,254 - mmseg - INFO - Iter [100/200]	lr: 5.453e-03, eta: 0:01:21, time: 0.824, data_time: 0.005, memory: 3521, decode.loss_ce: 0.5235, decode.acc_seg: 69.6885, aux.loss_ce: 0.2440, aux.acc_seg: 67.5897, loss: 0.7675
2021-11-18 03:56:20,975 - mmseg - INFO - Iter [110/200]	lr: 4.974e-03, eta: 0:01:13, time: 0.872, data_time: 0.005, memory: 3521, decode.loss_ce: 0.5104, decode.acc_seg: 66.6926, aux.loss_ce: 0.2414, aux.acc_seg: 64.4547, loss: 0.7517
2021-11-18 03:56:29,444 - mmseg - INFO - Iter [120/200]	lr: 4.489e-03, eta: 0:01:05, time: 0.847, data_time: 0.006, memory: 3521, decode.loss_ce: 0.4895, decode.acc_seg: 70.4047, aux.loss_ce: 0.2343, aux.acc_seg: 68.3340, loss: 0.7238
2021-11-18 03:56:37,687 - mmseg - INFO - Iter [130/200]	lr: 3.998e-03, eta: 0:00:57, time: 0.824, data_time: 0.005, memory: 3521, decode.loss_ce: 0.4159, decode.acc_seg: 75.1794, aux.loss_ce: 0.1998, aux.acc_seg: 73.5712, loss: 0.6158
2021-11-18 03:56:46,107 - mmseg - INFO - Iter [140/200]	lr: 3.500e-03, eta: 0:00:49, time: 0.842, data_time: 0.006, memory: 3521, decode.loss_ce: 0.4812, decode.acc_seg: 72.7432, aux.loss_ce: 0.2276, aux.acc_seg: 70.1527, loss: 0.7088
2021-11-18 03:56:54,881 - mmseg - INFO - Iter [150/200]	lr: 2.994e-03, eta: 0:00:41, time: 0.877, data_time: 0.053, memory: 3521, decode.loss_ce: 0.4600, decode.acc_seg: 73.2548, aux.loss_ce: 0.2293, aux.acc_seg: 70.2939, loss: 0.6893
2021-11-18 03:57:03,280 - mmseg - INFO - Iter [160/200]	lr: 2.478e-03, eta: 0:00:33, time: 0.840, data_time: 0.006, memory: 3521, decode.loss_ce: 0.4546, decode.acc_seg: 77.5512, aux.loss_ce: 0.2281, aux.acc_seg: 74.4491, loss: 0.6826
2021-11-18 03:57:11,813 - mmseg - INFO - Iter [170/200]	lr: 1.949e-03, eta: 0:00:24, time: 0.853, data_time: 0.005, memory: 3521, decode.loss_ce: 0.4950, decode.acc_seg: 68.7967, aux.loss_ce: 0.2458, aux.acc_seg: 65.7474, loss: 0.7408
2021-11-18 03:57:20,300 - mmseg - INFO - Iter [180/200]	lr: 1.402e-03, eta: 0:00:16, time: 0.849, data_time: 0.005, memory: 3521, decode.loss_ce: 0.4328, decode.acc_seg: 71.6784, aux.loss_ce: 0.2036, aux.acc_seg: 70.5171, loss: 0.6364
2021-11-18 03:57:28,797 - mmseg - INFO - Iter [190/200]	lr: 8.277e-04, eta: 0:00:08, time: 0.849, data_time: 0.005, memory: 3521, decode.loss_ce: 0.4477, decode.acc_seg: 74.6477, aux.loss_ce: 0.2182, aux.acc_seg: 72.8090, loss: 0.6659
2021-11-18 03:57:37,211 - mmseg - INFO - Saving checkpoint at 200 iterations
2021-11-18 03:57:38,067 - mmseg - INFO - Iter [200/200]	lr: 1.841e-04, eta: 0:00:00, time: 0.930, data_time: 0.005, memory: 3521, decode.loss_ce: 0.3921, decode.acc_seg: 73.7793, aux.loss_ce: 0.2060, aux.acc_seg: 70.2609, loss: 0.5981
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 143/143, 27.1 task/s, elapsed: 5s, ETA:     0s
2021-11-18 03:57:43,412 - mmseg - INFO - per class results:
2021-11-18 03:57:43,414 - mmseg - INFO - 
+--------+-------+-------+
| Class  |  IoU  |  Acc  |
+--------+-------+-------+
|  sky   |  88.8 | 93.79 |
|  tree  | 72.41 | 84.84 |
|  road  | 87.26 | 92.57 |
| grass  | 74.83 |  90.8 |
| water  | 69.31 | 87.65 |
|  bldg  | 79.14 | 87.31 |
|  mntn  | 26.12 | 29.31 |
| fg obj | 67.55 |  81.0 |
+--------+-------+-------+
2021-11-18 03:57:43,414 - mmseg - INFO - Summary:
2021-11-18 03:57:43,416 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 88.22 | 70.68 | 80.91 |
+-------+-------+-------+
2021-11-18 03:57:43,417 - mmseg - INFO - Iter(val) [143]	aAcc: 0.8822, mIoU: 0.7068, mAcc: 0.8091, IoU.sky: 0.8880, IoU.tree: 0.7241, IoU.road: 0.8726, IoU.grass: 0.7483, IoU.water: 0.6931, IoU.bldg: 0.7914, IoU.mntn: 0.2612, IoU.fg obj: 0.6755, Acc.sky: 0.9379, Acc.tree: 0.8484, Acc.road: 0.9257, Acc.grass: 0.9080, Acc.water: 0.8765, Acc.bldg: 0.8731, Acc.mntn: 0.2931, Acc.fg obj: 0.8100
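
The summary row can be cross-checked against the per-class table. The sketch below copies the per-class numbers from the log above and shows that mIoU and mAcc are the plain, unweighted means of the IoU and Acc columns. The dictionary layout and variable names are only illustrative. aAcc is computed over all pixels, so it cannot be recovered from this table.

```python
# Per-class (IoU, Acc) in percent, copied from the evaluation table above.
per_class = {
    "sky": (88.80, 93.79),
    "tree": (72.41, 84.84),
    "road": (87.26, 92.57),
    "grass": (74.83, 90.80),
    "water": (69.31, 87.65),
    "bldg": (79.14, 87.31),
    "mntn": (26.12, 29.31),
    "fg obj": (67.55, 81.00),
}

ious = [iou for iou, _ in per_class.values()]
accs = [acc for _, acc in per_class.values()]

# Unweighted means over the 8 classes.
miou = sum(ious) / len(ious)
macc = sum(accs) / len(accs)

# Class with the lowest IoU, i.e. the one the short 200-iteration run learned worst.
worst = min(per_class, key=lambda name: per_class[name][0])

print(f"mIoU: {miou:.2f}")  # mIoU: 70.68
print(f"mAcc: {macc:.2f}")  # mAcc: 80.91
print(f"lowest IoU: {worst}")  # lowest IoU: mntn
```

Because every class counts equally in the mean, the low `mntn` score pulls mIoU well below aAcc even though most pixels are classified correctly.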

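The `lr` column in the training log above decays smoothly from about 1e-2 to about 1.8e-4. The values are consistent with a polynomial ("poly") decay schedule. The sketch below is an assumption, not something shown in this section: it assumes a base learning rate of 0.01, a power of 0.9, and a minimum learning rate of 1e-4, and that the rate logged at `Iter [k/200]` was computed for the 0-based iteration `k - 1`. The function name `poly_lr` is hypothetical.

```python
def poly_lr(iteration, base_lr=0.01, min_lr=1e-4, power=0.9, max_iters=200):
    """Polynomial decay from base_lr towards min_lr (assumed settings)."""
    coeff = (1 - iteration / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


# Learning rates copied from the log above, keyed by the logged iteration number.
logged = {10: 9.598e-03, 100: 5.453e-03, 200: 1.841e-04}

for logged_iter, logged_lr in logged.items():
    # The log counts iterations from 1, the schedule from 0.
    computed = poly_lr(logged_iter - 1)
    print(f"iter {logged_iter}: computed {computed:.3e}, logged {logged_lr:.3e}")
```

If the computed and logged values agree to the printed precision, the assumed settings match the schedule used in this run.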
Inference with the trained model

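import mmcv
import matplotlib.pyplot as plt

# Load one image from the dataset as a BGR array
img = mmcv.imread('iccv09Data/images/6000124.jpg')

# Attach the training config so the inference pipeline can be built from it
model.cfg = cfg
result = inference_segmentor(model, img)
plt.figure(figsize=(8, 6))
show_result_pyplot(model, img, result, palette)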
<Figure size 576x432 with 0 Axes>
/home/ubuntu/miniconda3/envs/new/lib/python3.8/site-packages/mmseg/models/segmentors/base.py:264: UserWarning: show==False and out_file is not specified, only result image will be returned
  warnings.warn('show==False and out_file is not specified, only '
<Figure size 576x432 with 0 Axes>
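
Besides plotting, the prediction can be inspected numerically. In MMSegmentation 0.x, `inference_segmentor` returns a list with one 2-D array of class indices per image. The sketch below summarizes such an array as the fraction of pixels assigned to each class. The helper name `class_coverage` is hypothetical, the class order is assumed to follow the evaluation table above, and a small synthetic array stands in for `result[0]` so the example runs on its own.

```python
import numpy as np

# Class names in the order of the evaluation table (assumed to match label indices).
CLASSES = ("sky", "tree", "road", "grass", "water", "bldg", "mntn", "fg obj")


def class_coverage(seg_map, classes=CLASSES):
    """Return the fraction of pixels predicted for each class."""
    counts = np.bincount(seg_map.ravel(), minlength=len(classes))
    fractions = counts / counts.sum()
    return {name: float(frac) for name, frac in zip(classes, fractions)}


# Synthetic 2x3 label map standing in for result[0]:
# two sky pixels, one tree pixel, three road pixels.
demo = np.array([[0, 0, 1],
                 [2, 2, 2]])

coverage = class_coverage(demo)
for name, frac in coverage.items():
    if frac > 0:
        print(f"{name}: {frac:.1%}")
```

With a real prediction, `class_coverage(result[0])` would give the same kind of summary for the image shown above.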